Implement Status Reporting for EtcdCluster #136
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ballista01. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.

Hi @ballista01. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 7b62fb3 to 1081734.
// attempt processing again later. This could have been caused by a
// temporary network failure, or any other transient reason.
logger.Error(err, "Failed to get StatefulSet. Requesting requeue")
etcdCluster.Status.Phase = "Failed"
Q: In a scenario where the StatefulSets are already created but the `getStatefulSet` call failed, having `etcdCluster.Status.Phase` set to `Failed` might seem misleading, wdyt? Maybe `Unknown` makes more sense?
I agree that `Failed` is misleading. However, `Unknown` is ambiguous to the user. How about `Degraded`? Anyway, the user can find detailed info in the `conditions` field. The `etcdCluster.Status.Phase` is now a high-level summary based on `conditions`.
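For illustration only, here is a minimal sketch of how such a phase summary could be derived from `conditions`; the helper name, phase strings, and condition types are assumptions, not code from this PR.

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// summarizePhase derives a coarse, human-readable phase from the conditions
// list. The phase values and condition types are illustrative only.
func summarizePhase(conds []metav1.Condition) string {
	if meta.IsStatusConditionTrue(conds, "Degraded") {
		return "Degraded"
	}
	if meta.IsStatusConditionTrue(conds, "Available") {
		return "Running"
	}
	return "Progressing"
}
```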
Sure, `Degraded` is better in this case.

I was wondering if we should also keep a separate condition, `Unhealthy`, to distinguish between `Degraded` (when a Get/Create request fails) and `Unhealthy` (when the health check fails).
api/v1alpha1/etcdcluster_types.go (Outdated)
// ReadyReplicas is the number of pods targeted by this EtcdCluster with a Ready condition.
ReadyReplicas int32 `json:"readyReplicas,omitempty"`
// Members is the number of etcd members in the cluster reported by the etcd API.
Members int32 `json:"members,omitempty"`
// CurrentVersion is the version of the etcd cluster.
CurrentVersion string `json:"currentVersion,omitempty"`
// Phase indicates the state of the EtcdCluster.
Phase string `json:"phase,omitempty"`
// Conditions represent the latest available observations of the EtcdCluster's state.
// +optional
// +patchMergeKey=type
// +patchStrategy=merge
// +listType=atomic
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
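Since `Conditions` reuses the standard `metav1.Condition` type, updates can go through `meta.SetStatusCondition` from `k8s.io/apimachinery/pkg/api/meta`. A hedged sketch (the helper name, condition type, and reason strings are placeholders, not code from this PR):

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setAvailableCondition records an "Available" condition on the given slice.
// meta.SetStatusCondition deduplicates by Type and only updates
// LastTransitionTime when the Status value actually changes.
func setAvailableCondition(conds *[]metav1.Condition, ready bool, generation int64) {
	cond := metav1.Condition{
		Type:               "Available",
		Status:             metav1.ConditionTrue,
		Reason:             "ClusterReady",
		Message:            "all etcd members reported healthy",
		ObservedGeneration: generation,
	}
	if !ready {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "MembersNotReady"
		cond.Message = "one or more etcd members are not ready"
	}
	meta.SetStatusCondition(conds, cond)
}
```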
I think we need to get consensus on the field list first. I think we also need to record each member's status. cc @ivanvc @jmhbnz @jberkus @neolit123 @hakman @justinsb @ArkaSaha30

My draft thought:

- ReadyReplicas (similar to the StatefulSet's ReadyReplicas)
- CurrentReplicas (similar to the StatefulSet's CurrentReplicas)
- MemberCount (how many members are in the etcd cluster)
- Members (a slice):
  - name: etcd-1
    healthy: true
    Alarm/Error: nil
    version: 3.5.21
    storageVersion: 3.5.0
    downgradeInfo: nil
    dbSize: 5000000
    dbSizeInUse: 1000000
  - name: etcd-2
    healthy: true
    Alarm/Error: nil
    version: 3.5.21
    storageVersion: 3.5.0
    downgradeInfo: nil
    dbSize: 5000000
    dbSizeInUse: 1000000
  - name: etcd-3
    healthy: true
    Alarm/Error: nil
    version: 3.5.21
    storageVersion: 3.5.0
    downgradeInfo: nil
    dbSize: 5000000
    dbSizeInUse: 1000000
- Conditions []metav1.Condition
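To make the draft concrete, here is a rough sketch of what the per-member slice could look like as Go API types; the field names mirror a subset of the draft above and are not an agreed API.

```go
package v1alpha1

// MemberStatus sketches per-member status fields based on the draft above.
// Names and types are illustrative, not an agreed API.
type MemberStatus struct {
	// Name is the etcd member name (typically the pod name).
	Name string `json:"name,omitempty"`
	// Healthy reports the result of the member health check.
	Healthy bool `json:"healthy,omitempty"`
	// Version is the etcd server version reported by the member.
	Version string `json:"version,omitempty"`
	// StorageVersion is the storage version reported by the member.
	StorageVersion string `json:"storageVersion,omitempty"`
	// DbSize is the total backend database size in bytes.
	DbSize int64 `json:"dbSize,omitempty"`
	// DbSizeInUse is the logically in-use backend size in bytes.
	DbSizeInUse int64 `json:"dbSizeInUse,omitempty"`
}
```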
I was working in this direction last weekend and will post an updated draft soon.
I like the members slice idea. It provides more fine-grained info. Would love to work on that if we agree on it.
I don't mind if we push for an API that covers a number of interesting fields in the initial versions of the operator, but I think we should probably draft a Google Doc with a proposal, and for the next version just push out a very simple spec that has just the etcd version and maybe `health: true/false`. Note: even the health status needs some discussion of what it entails.

@ballista01 are you willing to draft such an API doc where we can rally discussions?
@neolit123 Sure! I'll post the link in slack.
What's the argument for the `dbSize: 5000000` / `dbSizeInUse: 1000000` fields? Is there some way that cluster operators would use that info that requires it to be available via the Kubernetes API, instead of just via etcdctl?
dbSize: 5000000 / dbSizeInUse: 1000000

etcd-operator can only get such info from the etcd cluster via the etcd SDK API (similar to etcdctl). One possible use case is automatic compaction & defragmentation. But as mentioned, we can get such info from the etcd cluster directly, so in general, any info that we can get from the etcd cluster directly might not need to go into EtcdCluster.Status.

As we discussed in the community meeting, one action item for now:

- Investigate the best practice for populating the Status. For example, how Kubernetes populates & uses the StatefulSet's Status, and how other well-known operators populate & use their Status.

In the long term, if we add any field/info to EtcdCluster.Status, we should have a clear goal first.
Yeah, I'm basically looking for use cases of "someone is going to use other Kubernetes components to automate this" which is a good justification for it being available via the Kubernetes API.
My suggestion regarding the status field:
status:
  observedGeneration: 5         # NEW: Top-level field indicating the latest .metadata.generation processed by the controller
  currentReplicas: 3            # Unchanged: Matches the StatefulSet replica count
  currentVersion: 3.6.0-rc.3    # Unchanged: Represents the running etcd version of the cluster
  leaderId: 975eab669fe7b560    # Unchanged: Leader member ID for status observability
  memberCount: 3                # Unchanged: Total number of etcd members
  readyReplicas: 3
  # lastDefragTime could be optionally added if the controller manages defragmentation
  # lastDefragTime: "2025-07-20T12:00:00Z"   # OPTIONAL: Cluster-level defragmentation completion time; not included in this PR
  members:
    - id: 22f4b6fa7639917a
      name: test-cluster-2
      version: 3.6.0-rc.3
      isHealthy: true
      isLearner: false          # NEW: Added to support learner promotion logic and troubleshooting
      # REMOVED: dbSize/dbSizeInUse → considered too dynamic; suggested to be exposed via /metrics
      # clientURL and peerURL omitted here for a minimal diagnostic view; can be re-added under a feature gate
    - id: 52117ab448fed65d
      name: test-cluster-1
      version: 3.6.0-rc.3
      isHealthy: true
      isLearner: false
    - id: 4f8c9e671cbd2233
      name: test-cluster-0
      version: 3.6.0-rc.3
      isHealthy: true
      isLearner: false
  conditions:
    - type: Available
      status: "True"
      reason: ClusterReady
      message: Cluster is fully available and healthy
      observedGeneration: 5
      lastTransitionTime: "2025-07-02T18:52:06Z"
      # Unchanged: Matches K8s convention for status.conditions
    - type: Progressing
      status: "False"
      reason: ReconcileSuccess
      message: Cluster reconciled to desired state
      observedGeneration: 5
      lastTransitionTime: "2025-07-02T18:52:06Z"
      # Unchanged: Indicates that spec and observed state are aligned
    - type: Degraded
      status: "False"
      reason: ClusterReady
      message: Cluster is healthy
      observedGeneration: 5
      lastTransitionTime: "2025-07-02T18:50:57Z"
      # Unchanged: Used to signal partial failures if any member is unhealthy

# Note: dbSize and dbSizeInUse fields are REMOVED in this revision.
# Rationale: These are volatile metrics not suitable for status fields;
# better exposed via a Prometheus-compatible /metrics endpoint for observability.
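For the new `observedGeneration` fields, a common pattern is to copy `.metadata.generation` into the top-level status and into each condition once the controller has processed that spec revision. A minimal sketch, with a hypothetical helper name:

```go
package controller

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markObservedGeneration copies metadata.generation into the top-level
// observedGeneration field and into every condition being written, so that
// consumers can tell which spec revision the reported status refers to.
// The function and parameter names are placeholders.
func markObservedGeneration(obj metav1.Object, observedGeneration *int64, conds []metav1.Condition) {
	gen := obj.GetGeneration()
	*observedGeneration = gen
	for i := range conds {
		conds[i].ObservedGeneration = gen
	}
}
```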
This is looking really good. However, I'm opening a discussion so that we can hammer out the final field list: issue #182.
Force-pushed from b1b18ae to a3d5999.
/ok-to-test
Hi, I know this PR is getting quite large, so I wanted to share my plan upfront. My main goal here is to use this as a tracker PR to run the full CI pipeline and give everyone a complete picture of the proposed changes. Once CI passes and we're happy with the overall direction, I'll split this up into smaller sub-PRs that will be much easier to review and merge.

I also noticed that, after the foundational work, the current monolithic Reconcile function makes it tricky to write unit tests for the new status logic. So my next major step will likely be to refactor the Reconcile loop into smaller, testable helper methods. I figure this might be a common challenge, so if there are already any existing issues, design docs, or discussions about refactoring the reconciler, please let me know! I'd love to build on any existing work.

I may start some exploratory work on the refactoring, but the final implementation in the smaller PRs will absolutely follow whatever we decide on as a community.
Force-pushed from c3e3b33 to 1ea2d50.
It seems that with the new status & condition reporting feature, the cyclomatic complexity of …
Force-pushed from 1814dc1 to 9e75a09.
…s integration, and etcd health improvements
Signed-off-by: Wenxue Zhao <[email protected]>
Force-pushed from 9e75a09 to 678976d.
Still working on reducing the complexity of the status & condition reporting code in …
Force-pushed from 4abe7b3 to d20b388.
- add condition helpers for consistent updates
- reconcile controller to patch status and observedGeneration each loop

Signed-off-by: Wenxue Zhao <[email protected]>
Force-pushed from d20b388 to 9de5dcf.
@ballista01: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This PR introduces status reporting for the `EtcdCluster` custom resource (v1alpha1) as proposed in issue #135. It allows users to observe the actual state of the managed etcd cluster directly from the `EtcdCluster` object. The original reconciliation logic is preserved.

Changes Implemented:

API types (`api/v1alpha1/etcdcluster_types.go`):
- Defined the `EtcdClusterStatus` struct.
- Added fields to `EtcdClusterStatus`: `readyReplicas` (int32), `members` (int32), `phase` (string), `conditions` ([]metav1.Condition), and `currentVersion` (string) (TO-DO: reporting logic).
- Enabled the `/status` subresource via the `+kubebuilder:subresource:status` marker.
- Regenerated `api/v1alpha1/zz_generated.deepcopy.go` using `make generate`.
- Regenerated `config/crd/bases/operator.etcd.io_etcdclusters.yaml` using `make generate` / `controller-gen crd`.

Controller (`internal/controller/etcdcluster_controller.go`):
- Modified the `Reconcile` method to gather status information:
  - `status.readyReplicas` from the managed StatefulSet.
  - `status.members` from the `healthCheck` results.
  - `status.phase` based on the observed state during reconciliation (e.g., Creating, Scaling, Running, Degraded, Failed). `conditions` and `currentVersion` are still to be implemented.
- Added `updateStatusIfNeeded` for clarity and consistency in patching the status.
- Added a `defer` statement to call `updateStatusIfNeeded` before the `Reconcile` function returns, ensuring the latest calculated status is attempted to be patched via `r.Status().Patch(ctx, ..., client.MergeFrom(oldStatus))`; a minimal sketch of this pattern is included below.
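As a rough sketch of the deferred status-patch pattern described in the last bullet (the method name, helper placement, and API import path are assumptions; only the `Status().Patch` / `client.MergeFrom` call mirrors the description above):

```go
package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	ecv1alpha1 "go.etcd.io/etcd-operator/api/v1alpha1" // assumed import path
)

// reconcileStatus sketches the deferred status-patch pattern: snapshot the
// object, mutate only Status during reconciliation, then patch the /status
// subresource against the snapshot on the way out.
func (r *EtcdClusterReconciler) reconcileStatus(ctx context.Context, ec *ecv1alpha1.EtcdCluster) (ctrl.Result, error) {
	oldStatus := ec.DeepCopy() // snapshot taken before any status mutation

	defer func() {
		// Patch only the status subresource, diffing against the snapshot.
		if err := r.Status().Patch(ctx, ec, client.MergeFrom(oldStatus)); err != nil {
			log.FromContext(ctx).Error(err, "failed to patch EtcdCluster status")
		}
	}()

	// ... reconcile the StatefulSet, run health checks, and populate
	// ec.Status.ReadyReplicas / Members / Phase here ...
	return ctrl.Result{}, nil
}
```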
How to Test:
- Build and deploy the operator: `make docker-build docker-push IMG=<your-image>` and `make deploy IMG=<your-image>`.
- Apply an `EtcdCluster` instance: `kubectl apply -f config/samples/operator_v1alpha1_etcdcluster.yaml` (ensure `spec.size` > 0 and `spec.version` is set).
- Observe the status: `kubectl get etcdcluster <name> -o yaml --watch`.
- Verify that `status.readyReplicas`, `status.members`, and `status.phase` are populated and change according to the cluster's lifecycle.

Future Considerations:
- Implement reporting for `status.currentVersion`.
- Implement `status.conditions`.
- Refine the `status.phase` logic for more accurate representation of intermediate and error states.